Client Report - What’s in a Name?

Course DS 250

Author

jaison jonnakuti

import pandas as pd
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)

Project Notes

For Project 1, the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.

# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html

# Include and execute your code here
df = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")

QUESTION|TASK 1

How does your name at your birth year compare to its use historically?

type your results and analysis here

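# Include and execute your code here
# The other queries in this report match capitalized names (e.g. 'Brittany'),
# so filter on 'Oliver'; a lowercase 'oliver' would match no rows.
name_oliver = df.query("name == 'Oliver'")

chart_oliver = (ggplot(name_oliver, aes('year', 'Total')) +
                geom_line(color='blue') +
                ggtitle("Historical Usage of the Name 'Oliver'") +
                xlab('Year') +
                ylab('Count') +
                geom_hline(yintercept=0, linetype='dotted', color='red') +
                scale_x_continuous(format='d') +  # year labels without a thousands comma
                theme(axis_text_x=element_text(angle=45, hjust=1)))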

chart_oliver.show()

QUESTION|TASK 2

If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?

type your results and analysis here

# Include and execute your code here
# label: Brittany Graph
# code-summary: Read and format data
# fig-cap: "Name popularity for 'Brittany' with reference line"
# fig-align: center

brittany_df = df.query("name == 'Brittany'")

brittany_chart = (
    ggplot(brittany_df, aes('year', 'Total')) +
    geom_line(color='blue') +
    ggtitle("Brittany Graph") +
    # Reference mark: dashed vertical line at 1990
    geom_vline(xintercept=1990, linetype='dashed', color='red') +
    xlab("Year") +
    ylab("Total Count") +
    scale_x_continuous(format='d') +  # year labels without a thousands comma
    theme(axis_text_x=element_text(angle=45, hjust=1))
)

brittany_chart.show()
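
The age guess can also be read off numerically rather than by eye. The following is a minimal sketch, not the method used in this report: it finds the peak year for a name and the span of years in which usage stays above a chosen share of that peak. The helper `likely_birth_years`, the `share` threshold, the `reference_year`, and the toy counts are all assumptions made for illustration; only the column names `name`, `year`, and `Total` come from the code above.

```python
import pandas as pd

# Toy rows using the same column names as names_year.csv.
# The counts are made up for illustration and are NOT real data.
toy = pd.DataFrame({
    "name": ["Brittany"] * 5,
    "year": [1980, 1985, 1990, 1995, 2000],
    "Total": [100, 400, 900, 500, 200],
})

def likely_birth_years(names_df, name, share=0.5):
    """Hypothetical helper: return (first_year, peak_year, last_year), where
    first/last bound the years whose count is at least `share` of the peak."""
    rows = names_df[names_df["name"] == name]
    peak_row = rows.loc[rows["Total"].idxmax()]
    common = rows[rows["Total"] >= share * peak_row["Total"]]
    return (int(common["year"].min()),
            int(peak_row["year"]),
            int(common["year"].max()))

first, peak, last = likely_birth_years(toy, "Brittany")

reference_year = 2024  # assumption: the year in which the phone call happens
age_guess = reference_year - peak
age_range = (reference_year - last, reference_year - first)
```

Ages outside `age_range` would be the ones not to guess. Passing `df` instead of `toy` would apply the same logic to the real data, and a different `share` widens or narrows the range.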

QUESTION|TASK 3

Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice?

type your results and analysis here

# Include and execute your code here
biblical_names = ['Mary', 'Martha', 'Peter', 'Paul']
biblical_data = df.query("name in @biblical_names and 1920 <= year <= 2000")

chart_biblical = (ggplot(biblical_data, aes('year', 'Total', color='name')) +
                  geom_line() +
                  ggtitle("Trends of Biblical Names (1920-2000)") +
                  xlab('Year') +
                  ylab('Count') +
                  scale_x_continuous(format='d') +  # year labels without a thousands comma
                  theme(axis_text_x=element_text(angle=45, hjust=1)))

chart_biblical.show()
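
To back up the trends seen in the chart with numbers, one option is to reshape the filtered rows so that each name becomes a column. The sketch below is illustrative only: it uses two of the four names and made-up counts, and the variable names `wide`, `peak_years`, and `change` are assumptions, not part of the original analysis.

```python
import pandas as pd

# Toy rows using the same column names as names_year.csv.
# The counts are made up for illustration and are NOT real data.
toy = pd.DataFrame({
    "name": ["Mary", "Mary", "Mary", "Paul", "Paul", "Paul"],
    "year": [1920, 1950, 2000, 1920, 1950, 2000],
    "Total": [700, 600, 100, 200, 300, 50],
})

# One row per year, one column per name.
wide = toy.pivot(index="year", columns="name", values="Total")

# Year in which each name reached its highest count.
peak_years = wide.idxmax()

# Relative change from the first to the last year of the range.
change = wide.loc[2000] / wide.loc[1920] - 1
```

With the real data, `biblical_data` would take the place of `toy`, and `peak_years` and `change` give one figure per name to quote in the written response.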

QUESTION|TASK 4

Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?

type your results and analysis here

# Include and execute your code here
name_peter = df.query("name == 'Peter'")

chart_peter = (ggplot(name_peter, aes('year', 'Total')) +
               geom_line(color='green') +
               # Reference marks: one dotted vertical line per movie release year
               geom_vline(xintercept=2002, linetype='dotted', color='red') +
               geom_vline(xintercept=2012, linetype='dotted', color='blue') +
               geom_vline(xintercept=2017, linetype='dotted', color='orange') +
               ggtitle("Trends for the Name 'Peter' with Movie Releases") +
               xlab('Year') +
               ylab('Count') +
               scale_x_continuous(format='d') +  # year labels without a thousands comma
               theme(axis_text_x=element_text(angle=45, hjust=1),
                     axis_text_y=element_text(size=8)))

chart_peter.show()
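
Whether usage shifts around a release year can be checked roughly by comparing average counts in a short window before and after it. This is a minimal sketch under stated assumptions: the helper `before_after_means`, the three-year `window`, and the toy counts are hypothetical, and a difference in means alone does not show that the movie caused the change.

```python
import pandas as pd

# Toy rows using the same column names as names_year.csv.
# The counts are made up for illustration and are NOT real data.
toy = pd.DataFrame({
    "name": ["Peter"] * 6,
    "year": [1999, 2000, 2001, 2002, 2003, 2004],
    "Total": [300, 290, 280, 270, 260, 250],
})

def before_after_means(names_df, name, release_year, window=3):
    """Hypothetical helper: mean count in the `window` years before the
    release year, and in the `window` years starting at the release year."""
    rows = names_df[names_df["name"] == name]
    before = rows[(rows["year"] >= release_year - window)
                  & (rows["year"] < release_year)]
    after = rows[(rows["year"] >= release_year)
                 & (rows["year"] < release_year + window)]
    return float(before["Total"].mean()), float(after["Total"].mean())

before_mean, after_mean = before_after_means(toy, "Peter", 2002)
```

Calling the helper on `df` once per release year marked in the chart would give one before/after pair for each vertical line.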

STRETCH QUESTION|TASK 1

Reproduce the chart Elliot using the data from the names_year.csv file.

type your results and analysis here

# Include and execute your code here
name_elliot = df.query("name == 'Elliot' and year >= 1950")

chart_elliot = (ggplot(name_elliot, aes('year', 'Total')) +
                geom_line(color='orange') +
                # Reference marks: dashed vertical lines at the milestone years
                geom_vline(xintercept=1982, linetype='dashed', color='red') +
                geom_vline(xintercept=1988, linetype='dashed', color='red') +
                geom_vline(xintercept=2002, linetype='dashed', color='red') +
                ggtitle("Trends for the Name 'Elliot' with Movie Milestones") +
                xlab('Year') +
                ylab('Count') +
                scale_x_continuous(format='d') +  # year labels without a thousands comma
                theme(axis_text_x=element_text(angle=45, hjust=1),
                      axis_text_y=element_text(size=8)))

chart_elliot.show()